Intelligent Paper Notes

Learning Representations for Hyper-Relational Knowledge Graphs

Harry Shomer , Wei Jin , Juanhui Li , Yao Ma , Jiliang Tang

Categories: Machine Learning | Artificial Intelligence

2022-08-30

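Knowledge graphs (KGs) have gained prominence for their ability to learn representations of uni-relational facts. Recently, research has focused on modeling hyper-relational facts, which move beyond the restriction of uni-relational facts and allow us to represent more complex, real-world information. However, existing approaches to learning representations on hyper-relational KGs mainly focus on enhancing the communication from qualifiers to base triples, while overlooking the flow of information from the base triple to the qualifiers. This can lead to suboptimal qualifier representations, especially when a large number of qualifiers are present. This motivates us to design a framework that uses multiple aggregators to learn representations of hyper-relational facts: one from the perspective of the base triple and the other from the perspective of the qualifiers. Experiments demonstrate the effectiveness of our framework for hyper-relational knowledge graph completion across multiple datasets. Furthermore, we conduct an ablation study that validates the importance of the various components of our framework. The code to reproduce our results is available at https://github.com/harryshomer/quad.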

translated by Google Translate
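To make the two information flows concrete, here is a minimal Python sketch of a hyper-relational fact (a base triple plus qualifier pairs) and of the two aggregation directions the abstract contrasts. It is not the authors' framework: the mean pooling, the element-wise composition of a qualifier relation with its entity, the `HyperRelationalFact` and `aggregate` names, and the toy entities and embeddings are all illustrative assumptions.

```python
from dataclasses import dataclass, field

Vector = list[float]


@dataclass
class HyperRelationalFact:
    """A base triple (head, relation, tail) plus (relation, entity) qualifier pairs."""
    head: str
    relation: str
    tail: str
    qualifiers: list[tuple[str, str]] = field(default_factory=list)


def mean(vectors: list[Vector]) -> Vector:
    # Element-wise mean of equally sized vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def compose(a: Vector, b: Vector) -> Vector:
    # Hypothetical composition of a qualifier relation with its entity (element-wise product).
    return [x * y for x, y in zip(a, b)]


def aggregate(fact: HyperRelationalFact, emb: dict[str, Vector]) -> tuple[Vector, list[Vector]]:
    triple = mean([emb[fact.head], emb[fact.relation], emb[fact.tail]])
    pairs = [compose(emb[r], emb[e]) for r, e in fact.qualifiers]
    # Direction 1: qualifiers -> base triple (the flow existing methods mainly model).
    triple_repr = add(triple, mean(pairs)) if pairs else triple
    # Direction 2: base triple -> each qualifier (the flow the abstract says is overlooked).
    qualifier_reprs = [add(pair, triple) for pair in pairs]
    return triple_repr, qualifier_reprs


# Toy usage with hypothetical entities and 2-d embeddings.
fact = HyperRelationalFact("Alice", "educated_at", "UniversityX",
                           [("degree", "PhD"), ("field", "Physics")])
emb = {"Alice": [1.0, 0.0], "educated_at": [0.0, 1.0], "UniversityX": [2.0, 2.0],
       "degree": [1.0, 2.0], "PhD": [3.0, 0.5], "field": [0.5, 1.0], "Physics": [2.0, 4.0]}
triple_repr, qualifier_reprs = aggregate(fact, emb)
```

In a learned model each direction would be its own trainable aggregator; the sketch only shows which information each direction reads from.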

Are Graph Neural Networks Really Helpful for Knowledge Graph Completion?

Juanhui Li , Harry Shomer , Jiayuan Ding , Yiqi Wang , Yao Ma , Neil Shah , Jiliang Tang , Dawei Yin

Categories: Artificial Intelligence | Machine Learning

2022-05-21

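Knowledge graphs (KGs) facilitate a wide variety of applications because of their ability to store relational knowledge applicable to many areas. Despite great efforts invested in their creation and maintenance, even the largest KGs are far from complete. Hence, KG completion (KGC) has become one of the most crucial tasks in KG research. Recently, a considerable body of literature in this space has centered on using graph neural networks (GNNs) to learn powerful embeddings that leverage the topological structure of KGs. Specifically, dedicated efforts have been made to extend GNNs, which are commonly designed for simple homogeneous and uni-relational graphs, to the KG context, which has diverse and multi-relational connections between entities, by designing more complex aggregation schemes over neighboring nodes (crucial to GNN performance) to appropriately leverage multi-relational information. The success of these methods is naturally attributed to the use of GNNs over simpler multi-layer perceptron (MLP) models, owing to their additional aggregation functionality. In this work, we find that simple MLP models are able to achieve performance comparable to GNNs, suggesting that aggregation may not be as crucial as previously believed. Through further exploration, we show that careful scoring function and loss function design has a much stronger influence on KGC model performance, and that aggregation is not practically required. This suggests a conflation of scoring function design, loss function design, and aggregation in prior work, with promising insights regarding the scalability of today's state-of-the-art KGC methods, as well as careful attention to more suitable aggregation designs for KGC tasks tomorrow. The implementation is available online: https://github.com/juanhui28/are_mpnns_helpful.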
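The abstract separates three design choices: the encoder (GNN aggregation vs. a plain MLP), the scoring function, and the loss. A minimal sketch of an aggregation-free pipeline, assuming a single ReLU layer as the encoder, DistMult as the scoring function, and binary cross-entropy on logits as the loss. These are common KGC choices used here for illustration only; they are not claimed to be the exact configurations evaluated in the paper, and the function names are hypothetical.

```python
import math

Vector = list[float]


def mlp_encode(x: Vector, weight: list[Vector], bias: Vector) -> Vector:
    # One dense layer + ReLU applied to a single embedding; no neighbour aggregation.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weight, bias)]


def distmult_score(h: Vector, r: Vector, t: Vector) -> float:
    # DistMult scoring function: tri-linear product sum_i h_i * r_i * t_i.
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def bce_with_logits(score: float, label: int) -> float:
    # Numerically stable binary cross-entropy computed on a raw score (logit).
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


# Usage: encode head and tail independently, score the triple, and compute the loss.
identity = [[1.0, 0.0], [0.0, 1.0]]
h = mlp_encode([1.0, 2.0], identity, [0.0, 0.0])
t = mlp_encode([0.5, 2.0], identity, [0.0, 0.0])
loss = bce_with_logits(distmult_score(h, [3.0, 1.0], t), label=1)
```

Swapping `distmult_score` or `bce_with_logits` for another scoring function or loss leaves the encoder untouched, which is the kind of separation the abstract argues prior work conflated.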

Rapid Extraction of Respiratory Waveforms from Photoplethysmography: A Deep Encoder Approach

Harry J. Davies , Danilo P. Mandic

Category: Machine Learning

2022-12-22

Much of the information about breathing is contained within the photoplethysmography (PPG) signal, through changes in venous blood flow, heart rate and stroke volume. We aim to leverage this fact by employing a novel deep learning framework based on a repurposed convolutional autoencoder. Our model aims to encode all of the relevant respiratory information contained within the photoplethysmography waveform, and decode it into a waveform that is similar to a gold-standard respiratory reference. The model is employed on two photoplethysmography data sets, namely Capnobase and BIDMC. We show that the model is capable of producing respiratory waveforms that approach the gold standard, while in turn producing state-of-the-art respiratory rate estimates. We also show that when it comes to capturing more advanced respiratory waveform characteristics such as duty cycle, our model is for the most part unsuccessful. A suggested reason for this, in light of a previous study on in-ear PPG, is that the respiratory variations in finger-PPG are far weaker than at other recording locations. Importantly, our model can perform these waveform estimates in a fraction of a millisecond, giving it the capacity to produce over 6 hours of respiratory waveforms in a single second. Moreover, we attempt to interpret the behaviour of the kernel weights within the model, showing that in part our model intuitively selects different breathing frequencies. The model proposed in this work could help to improve the usefulness of consumer PPG-based wearables for medical applications, where detailed respiratory information is required.
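Respiratory rate is one of the quantities the abstract reports from the reconstructed waveform. A simple way to turn any respiratory waveform into a rate is to count breath cycles over a window; the sketch below assumes that each upward zero crossing of the mean-removed signal marks one breath. This counting rule, the function name, and the synthetic test signal are assumptions for illustration, not the authors' estimator.

```python
import math


def respiratory_rate_bpm(waveform: list[float], fs_hz: float) -> float:
    """Estimate breaths per minute from a respiratory waveform.

    Counts upward zero crossings of the mean-removed signal, treating each one
    as the start of a new breath (a simplifying assumption).
    """
    mean = sum(waveform) / len(waveform)
    centred = [v - mean for v in waveform]
    breaths = sum(1 for a, b in zip(centred, centred[1:]) if a < 0.0 <= b)
    duration_min = len(waveform) / fs_hz / 60.0
    return breaths / duration_min


# Synthetic check: 60 s of a 0.25 Hz sinusoid (15 breaths/min) sampled at 25 Hz.
fs = 25.0
signal = [math.sin(2 * math.pi * 0.25 * (n / fs) + 0.3) for n in range(int(60 * fs))]
rate = respiratory_rate_bpm(signal, fs)
```

Zero-crossing counting is sensitive to noise and baseline drift on real signals, so a practical estimator would usually filter the waveform first.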

Deep Unfolded Tensor Robust PCA with Self-supervised Learning

Harry Dong , Megna Shah , Sean Donegan , Yuejie Chi

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-21

Tensor robust principal component analysis (RPCA), which seeks to separate a low-rank tensor from its sparse corruptions, has been crucial in data science and machine learning, where tensor structures are becoming more prevalent. While powerful, existing tensor RPCA algorithms can be difficult to use in practice, as their performance can be sensitive to the choice of additional hyperparameters, which are not straightforward to tune. In this paper, we describe a fast and simple self-supervised model for tensor RPCA using deep unfolding that learns only four hyperparameters. Despite its simplicity, our model eliminates the need for ground truth labels while maintaining competitive or even greater performance compared to supervised deep unfolding. Furthermore, our model is capable of operating in extreme data-starved scenarios. We demonstrate these claims on a mix of synthetic data and real-world tasks, comparing performance against previously studied supervised deep unfolding methods and Bayesian optimization baselines.
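Deep unfolding turns the iterations of a classical RPCA-style solver into layers whose few scalar parameters, such as thresholds, are learned. A building block common to such solvers is element-wise soft-thresholding, which updates the sparse component from the residual between the observation and the current low-rank estimate. The minimal sketch below shows that single step on a small matrix (a 2-way tensor) in plain Python; the function names and the threshold value are hypothetical, and neither the low-rank update nor the paper's four learned hyperparameters are reproduced.

```python
Matrix = list[list[float]]


def soft_threshold(x: float, lam: float) -> float:
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparse_update(observed: Matrix, low_rank: Matrix, lam: float) -> Matrix:
    """One sparse-component step: S = soft_threshold(Y - L, lam), element-wise."""
    return [[soft_threshold(y - l, lam) for y, l in zip(y_row, l_row)]
            for y_row, l_row in zip(observed, low_rank)]


# A constant (rank-1) background with one large corruption at position (1, 1).
Y = [[1.0, 1.0], [1.0, 9.0]]
L = [[1.0, 1.0], [1.0, 1.0]]
S = sparse_update(Y, L, lam=0.5)
```

In an unfolded network, `lam` would be a trainable parameter per layer rather than a fixed constant.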

AI applications in forest monitoring need remote sensing benchmark datasets

Emily R. Lines , Matt Allen , Carlos Cabo , Kim Calders , Amandine Debus , Stuart W. D. Grieve , Milto Miltiadou , Adam Noach , Harry J. F. Owen , Stefano Puliti

Category: Artificial Intelligence

2022-12-20

With the rise in high-resolution remote sensing technologies there has been an explosion in the amount of data available for forest monitoring, and an accompanying growth in artificial intelligence applications to automatically derive forest properties of interest from these datasets. Many studies use their own data at small spatio-temporal scales, and demonstrate an application of an existing or adapted data science method for a particular task. This approach often involves intensive and time-consuming data collection and processing, but generates results restricted to specific ecosystems and sensor types. There is a lack of widespread acknowledgement of how the types and structures of data used affect the performance and accuracy of analysis algorithms. To accelerate progress in the field more efficiently, benchmarking datasets upon which methods can be tested and compared are sorely needed. Here, we discuss how lack of standardisation impacts confidence in estimation of key forest properties, and how considerations of data collection need to be accounted for in assessing method performance. We present pragmatic requirements and considerations for the creation of rigorous, useful benchmarking datasets for forest monitoring applications, and discuss how tools from modern data science can improve use of existing data. We list a set of example large-scale datasets that could contribute to benchmarking, and present a vision for how community-driven, representative benchmarking initiatives could benefit the field.

Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers

Harry Coppock , George Nicholson , Ivan Kiskin , Vasiliki Koutra , Kieran Baker , Jobie Budd , Richard Payne , Emma Karoune , David Hurley , Alexander Titcomb

Category: Machine Learning

2022-12-15

Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection status. Here, we undertake a large-scale study of audio-based deep learning classifiers, as part of the UK government's pandemic response. We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata, including reverse transcription polymerase chain reaction (PCR) test outcomes, of whom 23,514 tested positive for SARS-CoV-2. Subjects were recruited via the UK government's National Health Service Test-and-Trace programme and the REal-time Assessment of Community Transmission (REACT) randomised surveillance survey. In an unadjusted analysis of our dataset, AI classifiers predict SARS-CoV-2 infection status with high accuracy (Receiver Operating Characteristic Area Under the Curve (ROC-AUC) 0.846 [0.838, 0.854]), consistent with the findings of previous studies. However, after matching on measured confounders, such as age, gender, and self-reported symptoms, our classifiers' performance is much weaker (ROC-AUC 0.619 [0.594, 0.644]). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by simple predictive scores based on user-reported symptoms.
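ROC-AUC, the metric quoted above (0.846 unadjusted vs. 0.619 after matching), equals the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with 0.5 corresponding to chance. A minimal pairwise sketch of that definition follows; it does not implement the study's confounder matching or its confidence intervals, and the toy scores are made up.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC-AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise loop is O(P*N); rank-based formulas give the same value more efficiently on datasets of the size described here.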

Statistical Design and Analysis for Robust Machine Learning: A Case Study from COVID-19

Davide Pigoli , Kieran Baker , Jobie Budd , Lorraine Butler , Harry Coppock , Sabrina Egglestone , Steven G. Gilmour , Chris Holmes , David Hurley , Radka Jersakova

Category: Machine Learning

2022-12-15

Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and of the assessment of the performance of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant metadata. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features, and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.

A large-scale and PCR-referenced vocal audio dataset for COVID-19

Jobie Budd , Kieran Baker , Emma Karoune , Harry Coppock , Selina Patel , Ana Tendero Cañadas , Alexander Titcomb , Richard Payne , David Hurley , Sabrina Egglestone

Category: Machine Learning

2022-12-15

The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma, and 27.20% with linked influenza PCR test results.
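The linkage counts above translate directly into coverage rates. A short calculation using only the figures stated in the abstract (the helper name `percent` is illustrative):

```python
def percent(part: int, whole: int) -> float:
    # Share of `whole` covered by `part`, as a percentage rounded to 2 decimals.
    return round(100.0 * part / whole, 2)


participants_linked = percent(70_794, 72_999)  # participants with a linked PCR result
positives_linked = percent(24_155, 25_776)     # positive cases with a linked PCR result
```

That is, roughly 97% of participants and 94% of positive cases have a linked PCR result.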

Autonomous Apple Fruitlet Sizing and Growth Rate Tracking using Computer Vision

Harry Freeman , Mohamad Qadri , Abhisesh Silwal , Paul O'Connor , Zachary Rubinstein , Daniel Cooley , George Kantor

Categories: Robotics | Computer Vision

2022-12-03

Measuring growth rates of apple fruitlets is important because it allows apple growers to determine when to apply chemical thinners to their crops to optimize yield. The current practice of obtaining growth rates involves using calipers to record sizes of fruitlets across multiple days. Because of the number of fruitlets that need to be sized, this method is laborious, time-consuming, and prone to human error. In this paper, we present a computer vision approach to measure the sizes and growth rates of apple fruitlets. With images collected by a hand-held stereo camera, our system detects, segments, and fits ellipses to fruitlets to measure their diameters. To measure growth rates, we utilize an Attentional Graph Neural Network to associate fruitlets across different days. We provide quantitative results on data collected in an apple orchard, and demonstrate that our system is able to predict abscission rates within 3% of the current method with a sevenfold improvement in speed, while requiring significantly less manual effort. Moreover, we provide results on images captured by a robotic system in the field, and discuss the next steps to make the process fully autonomous.
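Once an ellipse has been fitted to a fruitlet in the image, its pixel diameter has to be converted to physical units, which the stereo camera's depth makes possible, and diameters of the same fruitlet on different days give a growth rate. A minimal sketch, assuming the ideal pinhole camera relation (size = pixels x depth / focal length in pixels), taking the fitted ellipse's major axis as the diameter, and assuming a constant daily growth rate. The function names and numbers are hypothetical, and the cross-day association step (the attentional GNN) is not shown.

```python
def diameter_mm(major_axis_px: float, depth_mm: float, focal_px: float) -> float:
    # Pinhole model: world size = pixel size * depth / focal length (in pixels).
    return major_axis_px * depth_mm / focal_px


def growth_rate_mm_per_day(d_start_mm: float, d_end_mm: float, days: float) -> float:
    # Average diameter change per day between two measurements of the same fruitlet.
    return (d_end_mm - d_start_mm) / days


# The same (already associated) fruitlet imaged 3 days apart at 300 mm depth.
day_0 = diameter_mm(100.0, depth_mm=300.0, focal_px=2000.0)
day_3 = diameter_mm(120.0, depth_mm=300.0, focal_px=2000.0)
rate = growth_rate_mm_per_day(day_0, day_3, days=3.0)
```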

sEHR-CE: Language modelling of structured EHR data for efficient and generalizable patient cohort expansion

Anna Munoz-Farre , Harry Rose , Sera Aylin Cakiroglu

Categories: Natural Language Processing | Machine Learning

2022-11-30

Electronic health records (EHR) offer unprecedented opportunities for in-depth clinical phenotyping and prediction of clinical outcomes. Combining multiple data sources is crucial to generate a complete picture of disease prevalence, incidence and trajectories. The standard approach to combining clinical data involves collating clinical terms across different terminology systems using curated maps, which are often inaccurate and/or incomplete. Here, we propose sEHR-CE, a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets without relying on these mappings. We unify clinical terminologies using textual descriptors of concepts, and represent individuals' EHR as sections of text. We then fine-tune pre-trained language models to predict disease phenotypes more accurately than non-text and single-terminology approaches. We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study. Finally, we illustrate in a type 2 diabetes use case how sEHR-CE identifies individuals without a diagnosis who share clinical characteristics with patients.
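The idea of unifying terminologies through textual descriptors can be sketched as a lookup from (terminology, code) to a descriptor, with the descriptors grouped into one text section per data source. The terminology names, codes, and descriptors below are hypothetical placeholders, not real coding-system entries; the resulting string stands in for the kind of text a pre-trained language model would then be fine-tuned on.

```python
# Hypothetical lookup tables: terminology -> code -> textual descriptor.
DESCRIPTORS: dict[str, dict[str, str]] = {
    "terminology_a": {"A1": "type 2 diabetes mellitus", "A2": "essential hypertension"},
    "terminology_b": {"B7": "metformin prescription"},
}


def record_to_text(events: list[tuple[str, str, str]]) -> str:
    """Render (section, terminology, code) events as one text section per data source."""
    sections: dict[str, list[str]] = {}
    for section, terminology, code in events:
        # Codes from different terminologies meet only through their text descriptors.
        sections.setdefault(section, []).append(DESCRIPTORS[terminology][code])
    return " ".join(f"{name}: {', '.join(terms)}." for name, terms in sections.items())


text = record_to_text([
    ("primary care", "terminology_a", "A2"),
    ("primary care", "terminology_b", "B7"),
    ("secondary care", "terminology_a", "A1"),
])
```

Because no cross-terminology mapping is consulted, adding a new source only requires its descriptors, which mirrors the abstract's motivation for avoiding curated maps.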